Overview

Dataset statistics

Number of variables: 25
Number of observations: 20000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.8 MiB
Average record size in memory: 200.0 B
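These memory figures are consistent with every value occupying 8 bytes. That is an assumption: the report does not list the column dtypes (int64 or float64 would both fit), and index overhead is ignored. Under that assumption, 20000 rows × 8 B = 160000 B per column (the 156.2 KiB shown for each variable), 25 such columns come to about 3.8 MiB, and each record averages 200 B. A small check of the arithmetic:

```python
# Assumption: every column is stored with an 8-byte dtype (e.g. int64 or float64);
# the report does not list dtypes, and index overhead is ignored.
N_ROWS = 20_000
N_COLS = 25
BYTES_PER_VALUE = 8

column_bytes = N_ROWS * BYTES_PER_VALUE   # memory for one column
total_bytes = column_bytes * N_COLS       # memory for all 25 columns
record_bytes = total_bytes / N_ROWS       # average memory per row

print(f"Memory size per column: {column_bytes / 1024:.1f} KiB")    # 156.2 KiB
print(f"Total size in memory: {total_bytes / 1024 ** 2:.1f} MiB")  # 3.8 MiB
print(f"Average record size in memory: {record_bytes:.1f} B")      # 200.0 B
```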

Variable types

NUM: 22
CAT: 2
BOOL: 1

Reproduction

Analysis started: 2020-11-05 01:15:16.013152
Analysis finished: 2020-11-05 01:18:10.262330
Duration: 2 minutes and 54.25 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
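The reported duration is the difference between the start and finish timestamps. A quick check with Python's standard `datetime` module:

```python
from datetime import datetime

# Timestamps as printed in the Reproduction section of this report.
started = datetime.fromisoformat("2020-11-05 01:15:16.013152")
finished = datetime.fromisoformat("2020-11-05 01:18:10.262330")

elapsed = (finished - started).total_seconds()   # 174.249178 seconds
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")  # 2 minutes and 54.25 seconds
```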

Warnings

BILL_AMT2 is highly correlated with BILL_AMT1 and 1 other field [High correlation]
BILL_AMT1 is highly correlated with BILL_AMT2 [High correlation]
BILL_AMT3 is highly correlated with BILL_AMT2 and 1 other field [High correlation]
BILL_AMT4 is highly correlated with BILL_AMT3 and 2 other fields [High correlation]
BILL_AMT5 is highly correlated with BILL_AMT4 and 1 other field [High correlation]
BILL_AMT6 is highly correlated with BILL_AMT4 and 1 other field [High correlation]
PAY_AMT2 is highly skewed (γ1 = 30.58709063) [Skewed]
ID has unique values [Unique]
PAY_0 has 9765 (48.8%) zeros [Zeros]
PAY_2 has 10448 (52.2%) zeros [Zeros]
PAY_3 has 10489 (52.4%) zeros [Zeros]
PAY_4 has 11148 (55.7%) zeros [Zeros]
PAY_5 has 11272 (56.4%) zeros [Zeros]
PAY_6 has 10646 (53.2%) zeros [Zeros]
BILL_AMT1 has 1330 (6.7%) zeros [Zeros]
BILL_AMT2 has 1697 (8.5%) zeros [Zeros]
BILL_AMT3 has 1957 (9.8%) zeros [Zeros]
BILL_AMT4 has 2122 (10.6%) zeros [Zeros]
BILL_AMT5 has 2373 (11.9%) zeros [Zeros]
BILL_AMT6 has 2749 (13.7%) zeros [Zeros]
PAY_AMT1 has 3597 (18.0%) zeros [Zeros]
PAY_AMT2 has 3724 (18.6%) zeros [Zeros]
PAY_AMT3 has 4129 (20.6%) zeros [Zeros]
PAY_AMT4 has 4407 (22.0%) zeros [Zeros]
PAY_AMT5 has 4544 (22.7%) zeros [Zeros]
PAY_AMT6 has 4940 (24.7%) zeros [Zeros]
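The γ1 in the PAY_AMT2 warning is a sample skewness coefficient. Assuming the report relies on pandas' `Series.skew()`, which returns the bias-adjusted Fisher-Pearson coefficient G1 = g1 · sqrt(n(n-1)) / (n-2), the estimator can be sketched with the standard library alone. The function name is illustrative, not part of pandas-profiling:

```python
import math
from statistics import fmean


def adjusted_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness G1 (the estimator behind pandas' Series.skew)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # biased 2nd central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # biased 3rd central moment
    if m2 == 0:
        return 0.0  # constant data: report zero skew, as pandas does
    g1 = m3 / m2 ** 1.5  # moment-based skewness
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


# One very large payment among small ones produces a long right tail and a large positive G1.
print(adjusted_skewness([0, 0, 1000, 2000, 2000, 5000, 1_684_259]))
```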

Variables

ID
Real number (ℝ≥0)

UNIQUE

Distinct count: 20000
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10000.5
Minimum: 1
Maximum: 20000
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1000.95
Q1: 5000.75
median: 10000.5
Q3: 15000.25
95-th percentile: 19000.05
Maximum: 20000
Range: 19999
Interquartile range (IQR): 9999.5
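Because ID takes every integer from 1 to 20000 exactly once, its quantiles can be checked by hand. The values shown are consistent with linear interpolation between order statistics at position (n - 1) · q, which is pandas' default `interpolation='linear'` for `Series.quantile`; whether pandas-profiling calls that method internally is an assumption here. A minimal sketch with a hypothetical helper:

```python
import math


def quantile_linear(sorted_values, q):
    """Quantile by linear interpolation between the order statistics at position (n - 1) * q."""
    if not 0 <= q <= 1:
        raise ValueError("q must lie in [0, 1]")
    pos = (len(sorted_values) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


ids = list(range(1, 20_001))  # ID takes each integer from 1 to 20000 exactly once

for label, q in [("5-th percentile", 0.05), ("Q1", 0.25), ("median", 0.5),
                 ("Q3", 0.75), ("95-th percentile", 0.95)]:
    print(label, round(quantile_linear(ids, q), 2))

iqr = quantile_linear(ids, 0.75) - quantile_linear(ids, 0.25)
print("Interquartile range (IQR)", iqr)  # 9999.5
```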

Descriptive statistics

Standard deviation: 5773.647028
Coefficient of variation (CV): 0.577335836
Kurtosis: -1.2
Mean: 10000.5
Median Absolute Deviation (MAD): 5000
Skewness: 0
Sum: 200010000
Variance: 33335000
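The descriptive statistics for ID can be reproduced the same way. The sketch below uses Python's `statistics` module as a stand-in for pandas and assumes, as the numbers suggest, that the variance uses the sample (n - 1) denominator and that MAD means the median of absolute deviations from the median:

```python
import statistics

ids = list(range(1, 20_001))  # ID takes each integer from 1 to 20000 exactly once

mean = statistics.fmean(ids)
variance = statistics.variance(ids)  # sample variance, n - 1 in the denominator
std = statistics.stdev(ids)
cv = std / mean                      # coefficient of variation
med = statistics.median(ids)
mad = statistics.median([abs(x - med) for x in ids])  # median absolute deviation

print(f"Mean {mean}, Variance {variance}, Standard deviation {std:.6f}")
print(f"CV {cv:.9f}, MAD {mad}")
```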
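
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
2047 | 1 | < 0.1%
10912 | 1 | < 0.1%
12947 | 1 | < 0.1%
2708 | 1 | < 0.1%
661 | 1 | < 0.1%
6806 | 1 | < 0.1%
4759 | 1 | < 0.1%
19100 | 1 | < 0.1%
17053 | 1 | < 0.1%
8865 | 1 | < 0.1%
Other values (19990) | 19990 | > 99.9%

Minimum 5 values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
20000 | 1 | < 0.1%
19999 | 1 | < 0.1%
19998 | 1 | < 0.1%
19997 | 1 | < 0.1%
19996 | 1 | < 0.1%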
 

LIMIT_BAL
Real number (ℝ≥0)

Distinct count: 76
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 163301.184
Minimum: 10000.0
Maximum: 1000000.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 10000
5-th percentile: 20000
Q1: 50000
median: 130000
Q3: 230000
95-th percentile: 420000
Maximum: 1000000
Range: 990000
Interquartile range (IQR): 180000

Descriptive statistics

Standard deviation: 128746.7033
Coefficient of variation (CV): 0.7884003049
Kurtosis: 0.6077365416
Mean: 163301.184
Median Absolute Deviation (MAD): 80000
Skewness: 1.029815015
Sum: 3266023680
Variance: 1.65757136e+10

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
50000 | 2354 | 11.8%
20000 | 1355 | 6.8%
30000 | 1175 | 5.9%
80000 | 1063 | 5.3%
200000 | 990 | 5.0%
150000 | 731 | 3.7%
100000 | 699 | 3.5%
180000 | 653 | 3.3%
360000 | 574 | 2.9%
60000 | 567 | 2.8%
Other values (66) | 9839 | 49.2%

Minimum 5 values
Value | Count | Frequency (%)
10000 | 339 | 1.7%
16000 | 1 | < 0.1%
20000 | 1355 | 6.8%
30000 | 1175 | 5.9%
40000 | 154 | 0.8%

Maximum 5 values
Value | Count | Frequency (%)
1000000 | 1 | < 0.1%
800000 | 2 | < 0.1%
750000 | 4 | < 0.1%
740000 | 1 | < 0.1%
720000 | 1 | < 0.1%
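Each Frequency (%) entry is the count divided by the 20000 observations. The sketch below approximates the display rules seen in this report (one decimal place, with `< 0.1%` and `> 99.9%` at the extremes); those thresholds are inferred from the values shown here, not taken from the pandas-profiling source, and `format_frequency` is a hypothetical name. It also confirms that LIMIT_BAL's ten most common values plus "Other values (66)" account for every row:

```python
N_OBS = 20_000


def format_frequency(count, n=N_OBS):
    """Format count / n as a percentage using the display rules assumed for this report."""
    pct = 100 * count / n
    if 0 < pct < 0.1:
        return "< 0.1%"
    if 99.9 < pct < 100:
        return "> 99.9%"
    return f"{pct:.1f}%"


# LIMIT_BAL: counts of the ten most common values, plus the "Other values (66)" row.
top_counts = [2354, 1355, 1175, 1063, 990, 731, 699, 653, 574, 567]
other = 9839
assert sum(top_counts) + other == N_OBS

for count in (2354, 1, 19990):
    print(count, format_frequency(count))
```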
 

SEX
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Common values
Value | Count | Frequency (%)
2 | 12281 | 61.4%
1 | 7719 | 38.6%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

EDUCATION
Real number (ℝ≥0)

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.83695
Minimum: 0
Maximum: 6
Zeros: 9
Zeros (%): < 0.1%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 2
Q3: 2
95-th percentile: 3
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7695416215
Coefficient of variation (CV): 0.4189235534
Kurtosis: 1.878845998
Mean: 1.83695
Median Absolute Deviation (MAD): 1
Skewness: 0.9010035439
Sum: 36739
Variance: 0.5921943072

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
2 | 9451 | 47.3%
1 | 7113 | 35.6%
3 | 3191 | 16.0%
5 | 151 | 0.8%
4 | 57 | 0.3%
6 | 28 | 0.1%
0 | 9 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 9 | < 0.1%
1 | 7113 | 35.6%
2 | 9451 | 47.3%
3 | 3191 | 16.0%
4 | 57 | 0.3%

Maximum 5 values
Value | Count | Frequency (%)
6 | 28 | 0.1%
5 | 151 | 0.8%
4 | 57 | 0.3%
3 | 3191 | 16.0%
2 | 9451 | 47.3%
 

MARRIAGE
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Common values
Value | Count | Frequency (%)
2 | 10702 | 53.5%
1 | 9033 | 45.2%
3 | 232 | 1.2%
0 | 33 | 0.2%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

AGE
Real number (ℝ≥0)

Distinct count: 55
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.33325
Minimum: 21
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 21
5-th percentile: 23
Q1: 28
median: 34
Q3: 41
95-th percentile: 53
Maximum: 79
Range: 58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.210658839
Coefficient of variation (CV): 0.2606796386
Kurtosis: 0.06718029386
Mean: 35.33325
Median Absolute Deviation (MAD): 6
Skewness: 0.7401311192
Sum: 706665
Variance: 84.83623625

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
29 | 1068 | 5.3%
27 | 974 | 4.9%
28 | 943 | 4.7%
30 | 910 | 4.5%
26 | 871 | 4.4%
25 | 814 | 4.1%
24 | 786 | 3.9%
33 | 776 | 3.9%
34 | 767 | 3.8%
31 | 766 | 3.8%
Other values (45) | 11325 | 56.6%

Minimum 5 values
Value | Count | Frequency (%)
21 | 46 | 0.2%
22 | 410 | 2.1%
23 | 661 | 3.3%
24 | 786 | 3.9%
25 | 814 | 4.1%

Maximum 5 values
Value | Count | Frequency (%)
79 | 1 | < 0.1%
75 | 1 | < 0.1%
73 | 2 | < 0.1%
72 | 1 | < 0.1%
71 | 2 | < 0.1%
 

PAY_0
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.02145
Minimum: -2
Maximum: 8
Zeros: 9765
Zeros (%): 48.8%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.121094439
Coefficient of variation (CV): 52.26547499
Kurtosis: 3.21004583
Mean: 0.02145
Median Absolute Deviation (MAD): 1
Skewness: 0.8208641883
Sum: 429
Variance: 1.25685274
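The coefficient of variation is std / mean, so it becomes unstable when the mean is close to zero and takes the sign of the mean. That explains why PAY_0 (mean 0.02145) shows a CV above 50, while PAY_2 through PAY_6, whose means are negative, show negative CVs. Recomputing from the means and standard deviations reported for these columns:

```python
# (mean, standard deviation) for each repayment-status column, copied from this report.
pay_stats = {
    "PAY_0": (0.02145, 1.121094439),
    "PAY_2": (-0.1042, 1.204124103),
    "PAY_3": (-0.1363, 1.210659157),
    "PAY_4": (-0.19735, 1.16806301),
    "PAY_5": (-0.2339, 1.142478032),
    "PAY_6": (-0.2614, 1.167063898),
}

# CV = std / mean: it grows without bound as the mean approaches 0 and follows the mean's sign.
cvs = {name: std / mean for name, (mean, std) in pay_stats.items()}
for name, cv in cvs.items():
    print(f"{name}: CV = {cv:.4f}")
```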
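
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 9765 | 48.8%
-1 | 3886 | 19.4%
1 | 2569 | 12.8%
2 | 1890 | 9.4%
-2 | 1585 | 7.9%
3 | 202 | 1.0%
4 | 58 | 0.3%
8 | 17 | 0.1%
5 | 13 | 0.1%
6 | 8 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 1585 | 7.9%
-1 | 3886 | 19.4%
0 | 9765 | 48.8%
1 | 2569 | 12.8%
2 | 1890 | 9.4%

Maximum 5 values
Value | Count | Frequency (%)
8 | 17 | 0.1%
7 | 7 | < 0.1%
6 | 8 | < 0.1%
5 | 13 | 0.1%
4 | 58 | 0.3%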
 

PAY_2
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.1042
Minimum: -2
Maximum: 8
Zeros: 10448
Zeros (%): 52.2%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.204124103
Coefficient of variation (CV): -11.5558935
Kurtosis: 1.730356002
Mean: -0.1042
Median Absolute Deviation (MAD): 0
Skewness: 0.8319999703
Sum: -2084
Variance: 1.449914856

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 10448 | 52.2%
-1 | 4114 | 20.6%
2 | 2742 | 13.7%
-2 | 2341 | 11.7%
3 | 228 | 1.1%
4 | 61 | 0.3%
5 | 20 | 0.1%
1 | 19 | 0.1%
7 | 17 | 0.1%
6 | 9 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 2341 | 11.7%
-1 | 4114 | 20.6%
0 | 10448 | 52.2%
1 | 19 | 0.1%
2 | 2742 | 13.7%

Maximum 5 values
Value | Count | Frequency (%)
8 | 1 | < 0.1%
7 | 17 | 0.1%
6 | 9 | < 0.1%
5 | 20 | 0.1%
4 | 61 | 0.3%
 

PAY_3
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.1363
Minimum: -2
Maximum: 8
Zeros: 10489
Zeros (%): 52.4%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.210659157
Coefficient of variation (CV): -8.882312231
Kurtosis: 2.532713318
Mean: -0.1363
Median Absolute Deviation (MAD): 0
Skewness: 0.935031521
Sum: -2726
Variance: 1.465695595

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 10489 | 52.4%
-1 | 4032 | 20.2%
2 | 2655 | 13.3%
-2 | 2547 | 12.7%
3 | 150 | 0.8%
4 | 59 | 0.3%
7 | 27 | 0.1%
6 | 20 | 0.1%
5 | 15 | 0.1%
1 | 4 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 2547 | 12.7%
-1 | 4032 | 20.2%
0 | 10489 | 52.4%
1 | 4 | < 0.1%
2 | 2655 | 13.3%

Maximum 5 values
Value | Count | Frequency (%)
8 | 2 | < 0.1%
7 | 27 | 0.1%
6 | 20 | 0.1%
5 | 15 | 0.1%
4 | 59 | 0.3%
 

PAY_4
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.19735
Minimum: -2
Maximum: 8
Zeros: 11148
Zeros (%): 55.7%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.16806301
Coefficient of variation (CV): -5.918738334
Kurtosis: 3.857599321
Mean: -0.19735
Median Absolute Deviation (MAD): 0
Skewness: 1.066045653
Sum: -3947
Variance: 1.364371196

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 11148 | 55.7%
-1 | 3772 | 18.9%
-2 | 2722 | 13.6%
2 | 2095 | 10.5%
3 | 135 | 0.7%
4 | 51 | 0.3%
7 | 41 | 0.2%
5 | 27 | 0.1%
6 | 5 | < 0.1%
8 | 2 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 2722 | 13.6%
-1 | 3772 | 18.9%
0 | 11148 | 55.7%
1 | 2 | < 0.1%
2 | 2095 | 10.5%

Maximum 5 values
Value | Count | Frequency (%)
8 | 2 | < 0.1%
7 | 41 | 0.2%
6 | 5 | < 0.1%
5 | 27 | 0.1%
4 | 51 | 0.3%
 

PAY_5
Real number (ℝ)

ZEROS

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2339
Minimum: -2
Maximum: 8
Zeros: 11272
Zeros (%): 56.4%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.142478032
Coefficient of variation (CV): -4.884472132
Kurtosis: 4.002151329
Mean: -0.2339
Median Absolute Deviation (MAD): 0
Skewness: 1.043625791
Sum: -4678
Variance: 1.305256053

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 11272 | 56.4%
-1 | 3774 | 18.9%
-2 | 2830 | 14.1%
2 | 1876 | 9.4%
3 | 130 | 0.7%
4 | 64 | 0.3%
7 | 41 | 0.2%
5 | 9 | < 0.1%
6 | 3 | < 0.1%
8 | 1 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 2830 | 14.1%
-1 | 3774 | 18.9%
0 | 11272 | 56.4%
2 | 1876 | 9.4%
3 | 130 | 0.7%

Maximum 5 values
Value | Count | Frequency (%)
8 | 1 | < 0.1%
7 | 41 | 0.2%
6 | 3 | < 0.1%
5 | 9 | < 0.1%
4 | 64 | 0.3%
 

PAY_6
Real number (ℝ)

ZEROS

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2614
Minimum: -2
Maximum: 8
Zeros: 10646
Zeros (%): 53.2%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.167063898
Coefficient of variation (CV): -4.464666786
Kurtosis: 3.378433112
Mean: -0.2614
Median Absolute Deviation (MAD): 0
Skewness: 0.9887823395
Sum: -5228
Variance: 1.362038142

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 10646 | 53.2%
-1 | 4041 | 20.2%
-2 | 3069 | 15.3%
2 | 2014 | 10.1%
3 | 140 | 0.7%
7 | 32 | 0.2%
4 | 32 | 0.2%
6 | 15 | 0.1%
5 | 9 | < 0.1%
8 | 2 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 3069 | 15.3%
-1 | 4041 | 20.2%
0 | 10646 | 53.2%
2 | 2014 | 10.1%
3 | 140 | 0.7%

Maximum 5 values
Value | Count | Frequency (%)
8 | 2 | < 0.1%
7 | 32 | 0.2%
6 | 15 | 0.1%
5 | 9 | < 0.1%
4 | 32 | 0.2%
 

BILL_AMT1
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 15918
Unique (%): 79.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50022.29625
Minimum: -165580.0
Maximum: 964511.0
Zeros: 1330
Zeros (%): 6.7%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -165580
5-th percentile: 0
Q1: 3688.25
median: 22541
Q3: 65061.75
95-th percentile: 194282
Maximum: 964511
Range: 1130091
Interquartile range (IQR): 61373.5

Descriptive statistics

Standard deviation: 71498.06441
Coefficient of variation (CV): 1.429323917
Kurtosis: 10.32326362
Mean: 50022.29625
Median Absolute Deviation (MAD): 21845
Skewness: 2.704989067
Sum: 1000445925
Variance: 5111973214

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 1330 | 6.7%
390 | 152 | 0.8%
780 | 58 | 0.3%
316 | 51 | 0.3%
326 | 50 | 0.2%
2500 | 37 | 0.2%
396 | 28 | 0.1%
2400 | 27 | 0.1%
1050 | 19 | 0.1%
-200 | 19 | 0.1%
Other values (15908) | 18229 | 91.1%

Minimum 5 values
Value | Count | Frequency (%)
-165580 | 1 | < 0.1%
-15308 | 1 | < 0.1%
-14386 | 1 | < 0.1%
-9802 | 1 | < 0.1%
-9095 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
964511 | 1 | < 0.1%
630458 | 1 | < 0.1%
621749 | 1 | < 0.1%
610723 | 1 | < 0.1%
604019 | 1 | < 0.1%
 

BILL_AMT2
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 15638
Unique (%): 78.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48149.92915
Minimum: -33350.0
Maximum: 983931.0
Zeros: 1697
Zeros (%): 8.5%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -33350
5-th percentile: 0
Q1: 3147
median: 21532
Q3: 62435.75
95-th percentile: 187924.35
Maximum: 983931
Range: 1017281
Interquartile range (IQR): 59288.75

Descriptive statistics

Standard deviation: 69443.17584
Coefficient of variation (CV): 1.442227996
Kurtosis: 11.32318576
Mean: 48149.92915
Median Absolute Deviation (MAD): 21136
Skewness: 2.785568925
Sum: 962998583
Variance: 4822354671

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 1697 | 8.5%
390 | 144 | 0.7%
316 | 58 | 0.3%
326 | 52 | 0.3%
780 | 46 | 0.2%
2500 | 34 | 0.2%
2400 | 29 | 0.1%
396 | 28 | 0.1%
-200 | 23 | 0.1%
1050 | 20 | 0.1%
Other values (15628) | 17869 | 89.3%

Minimum 5 values
Value | Count | Frequency (%)
-33350 | 1 | < 0.1%
-30000 | 1 | < 0.1%
-26214 | 1 | < 0.1%
-24704 | 1 | < 0.1%
-24702 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
983931 | 1 | < 0.1%
743970 | 1 | < 0.1%
646770 | 1 | < 0.1%
605943 | 1 | < 0.1%
597793 | 1 | < 0.1%
 

BILL_AMT3
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 15384
Unique (%): 76.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45728.7874
Minimum: -157264.0
Maximum: 1664089.0
Zeros: 1957
Zeros (%): 9.8%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -157264
5-th percentile: 0
Q1: 2720
median: 20160
Q3: 58668.5
95-th percentile: 179964.75
Maximum: 1664089
Range: 1821353
Interquartile range (IQR): 55948.5

Descriptive statistics

Standard deviation: 67151.05479
Coefficient of variation (CV): 1.468463491
Kurtosis: 25.96669101
Mean: 45728.7874
Median Absolute Deviation (MAD): 19770
Skewness: 3.285701752
Sum: 914575748
Variance: 4509264160

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 1957 | 9.8%
390 | 176 | 0.9%
780 | 51 | 0.3%
316 | 51 | 0.3%
326 | 44 | 0.2%
396 | 28 | 0.1%
2500 | 26 | 0.1%
2400 | 26 | 0.1%
200 | 22 | 0.1%
416 | 18 | 0.1%
Other values (15374) | 17601 | 88.0%

Minimum 5 values
Value | Count | Frequency (%)
-157264 | 1 | < 0.1%
-61506 | 1 | < 0.1%
-34041 | 1 | < 0.1%
-20320 | 1 | < 0.1%
-15910 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
1664089 | 1 | < 0.1%
693131 | 1 | < 0.1%
597415 | 1 | < 0.1%
578971 | 1 | < 0.1%
548020 | 1 | < 0.1%
 

BILL_AMT4
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 15059
Unique (%): 75.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41465.52835
Minimum: -170000.0
Maximum: 891586.0
Zeros: 2122
Zeros (%): 10.6%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -170000
5-th percentile: 0
Q1: 2309.5
median: 18889.5
Q3: 51123.5
95-th percentile: 166661.95
Maximum: 891586
Range: 1061586
Interquartile range (IQR): 48814

Descriptive statistics

Standard deviation: 61660.90664
Coefficient of variation (CV): 1.487040177
Kurtosis: 11.98792725
Mean: 41465.52835
Median Absolute Deviation (MAD): 18326.5
Skewness: 2.872809794
Sum: 829310567
Variance: 3802067407

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 2122 | 10.6%
390 | 162 | 0.8%
780 | 68 | 0.3%
316 | 52 | 0.3%
326 | 44 | 0.2%
150 | 32 | 0.2%
396 | 27 | 0.1%
2400 | 27 | 0.1%
2500 | 25 | 0.1%
416 | 22 | 0.1%
Other values (15049) | 17419 | 87.1%

Minimum 5 values
Value | Count | Frequency (%)
-170000 | 1 | < 0.1%
-81334 | 1 | < 0.1%
-34503 | 1 | < 0.1%
-24303 | 1 | < 0.1%
-20320 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
891586 | 1 | < 0.1%
628699 | 1 | < 0.1%
569034 | 1 | < 0.1%
542653 | 1 | < 0.1%
530672 | 1 | < 0.1%
 

BILL_AMT5
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 14727
Unique (%): 73.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39526.67165
Minimum: -37594.0
Maximum: 927171.0
Zeros: 2373
Zeros (%): 11.9%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -37594
5-th percentile: 0
Q1: 1718.75
median: 18132
Q3: 49529.25
95-th percentile: 160874
Maximum: 927171
Range: 964765
Interquartile range (IQR): 47810.5

Descriptive statistics

Standard deviation: 59309.32739
Coefficient of variation (CV): 1.500488782
Kurtosis: 12.42749486
Mean: 39526.67165
Median Absolute Deviation (MAD): 17713.5
Skewness: 2.8824658
Sum: 790533433
Variance: 3517596315

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 2373 | 11.9%
390 | 165 | 0.8%
316 | 62 | 0.3%
780 | 57 | 0.3%
326 | 44 | 0.2%
150 | 39 | 0.2%
396 | 28 | 0.1%
2400 | 27 | 0.1%
416 | 24 | 0.1%
2500 | 23 | 0.1%
Other values (14717) | 17158 | 85.8%

Minimum 5 values
Value | Count | Frequency (%)
-37594 | 1 | < 0.1%
-36156 | 1 | < 0.1%
-28335 | 1 | < 0.1%
-23003 | 1 | < 0.1%
-20753 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
927171 | 1 | < 0.1%
551702 | 1 | < 0.1%
547880 | 1 | < 0.1%
505473 | 1 | < 0.1%
503914 | 1 | < 0.1%
 

BILL_AMT6
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 14339
Unique (%): 71.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38175.69155
Minimum: -339603.0
Maximum: 961664.0
Zeros: 2749
Zeros (%): 13.7%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -339603
5-th percentile: 0
Q1: 1193.75
median: 16995.5
Q3: 48672.25
95-th percentile: 157758.05
Maximum: 961664
Range: 1301267
Interquartile range (IQR): 47478.5

Descriptive statistics

Standard deviation: 58707.21876
Coefficient of variation (CV): 1.537816772
Kurtosis: 13.82804934
Mean: 38175.69155
Median Absolute Deviation (MAD): 16679.5
Skewness: 2.945456668
Sum: 763513831
Variance: 3446537534

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 2749 | 13.7%
390 | 133 | 0.7%
316 | 65 | 0.3%
150 | 59 | 0.3%
780 | 56 | 0.3%
326 | 40 | 0.2%
396 | 28 | 0.1%
-18 | 22 | 0.1%
2400 | 22 | 0.1%
416 | 22 | 0.1%
Other values (14329) | 16804 | 84.0%

Minimum 5 values
Value | Count | Frequency (%)
-339603 | 1 | < 0.1%
-150953 | 1 | < 0.1%
-51443 | 1 | < 0.1%
-51183 | 1 | < 0.1%
-45734 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
961664 | 1 | < 0.1%
699944 | 1 | < 0.1%
568638 | 1 | < 0.1%
527711 | 1 | < 0.1%
527566 | 1 | < 0.1%
 

PAY_AMT1
Real number (ℝ≥0)

ZEROS

Distinct count: 6067
Unique (%): 30.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5521.0682
Minimum: 0.0
Maximum: 505000.0
Zeros: 3597
Zeros (%): 18.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 967.75
median: 2084
Q3: 5000
95-th percentile: 18163.55
Maximum: 505000
Range: 505000
Interquartile range (IQR): 4032.25

Descriptive statistics

Standard deviation: 15250.37482
Coefficient of variation (CV): 2.762214534
Kurtosis: 192.6437216
Mean: 5521.0682
Median Absolute Deviation (MAD): 1916.5
Skewness: 11.10137038
Sum: 110421364
Variance: 232573932.2

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 3597 | 18.0%
2000 | 909 | 4.5%
3000 | 573 | 2.9%
5000 | 465 | 2.3%
1500 | 353 | 1.8%
4000 | 300 | 1.5%
1000 | 265 | 1.3%
10000 | 258 | 1.3%
2500 | 199 | 1.0%
6000 | 184 | 0.9%
Other values (6057) | 12897 | 64.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 3597 | 18.0%
1 | 6 | < 0.1%
2 | 12 | 0.1%
3 | 8 | < 0.1%
4 | 10 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
505000 | 1 | < 0.1%
405016 | 1 | < 0.1%
368199 | 1 | < 0.1%
302000 | 1 | < 0.1%
300000 | 1 | < 0.1%
 

PAY_AMT2
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 5922
Unique (%): 29.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5746.19355
Minimum: 0.0
Maximum: 1684259.0
Zeros: 3724
Zeros (%): 18.6%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 740.75
median: 2000
Q3: 5000
95-th percentile: 18667.25
Maximum: 1684259
Range: 1684259
Interquartile range (IQR): 4259.25

Descriptive statistics

Standard deviation: 21518.62324
Coefficient of variation (CV): 3.744848315
Kurtosis: 1948.43083
Mean: 5746.19355
Median Absolute Deviation (MAD): 1941
Skewness: 30.58709063
Sum: 114923871
Variance: 463051145.9

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 3724 | 18.6%
2000 | 892 | 4.5%
3000 | 571 | 2.9%
5000 | 484 | 2.4%
1000 | 453 | 2.3%
1500 | 382 | 1.9%
4000 | 263 | 1.3%
10000 | 202 | 1.0%
6000 | 187 | 0.9%
1200 | 173 | 0.9%
Other values (5912) | 12669 | 63.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 3724 | 18.6%
1 | 9 | < 0.1%
2 | 15 | 0.1%
3 | 15 | 0.1%
4 | 7 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
1684259 | 1 | < 0.1%
580464 | 1 | < 0.1%
415552 | 1 | < 0.1%
401003 | 1 | < 0.1%
385228 | 1 | < 0.1%
 

PAY_AMT3
Real number (ℝ≥0)

ZEROS

Distinct count: 5500
Unique (%): 27.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4734.48815
Minimum: 0.0
Maximum: 896040.0
Zeros: 4129
Zeros (%): 20.6%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 322
median: 1593
Q3: 4054.5
95-th percentile: 15304.8
Maximum: 896040
Range: 896040
Interquartile range (IQR): 3732.5

Descriptive statistics

Standard deviation: 15823.31417
Coefficient of variation (CV): 3.342138299
Kurtosis: 639.0253309
Mean: 4734.48815
Median Absolute Deviation (MAD): 1593
Skewness: 17.74716555
Sum: 94689763
Variance: 250377271.3

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 4129 | 20.6%
1000 | 837 | 4.2%
2000 | 826 | 4.1%
3000 | 559 | 2.8%
5000 | 511 | 2.6%
1500 | 304 | 1.5%
4000 | 249 | 1.2%
10000 | 214 | 1.1%
2500 | 160 | 0.8%
6000 | 158 | 0.8%
Other values (5490) | 12053 | 60.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4129 | 20.6%
1 | 9 | < 0.1%
2 | 17 | 0.1%
3 | 10 | 0.1%
4 | 12 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
896040 | 1 | < 0.1%
417588 | 1 | < 0.1%
371718 | 1 | < 0.1%
338394 | 1 | < 0.1%
332809 | 1 | < 0.1%
 

PAY_AMT4
Real number (ℝ≥0)

ZEROS

Distinct count: 5293
Unique (%): 26.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4725.79775
Minimum: 0.0
Maximum: 497000.0
Zeros: 4407
Zeros (%): 22.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 237.75
median: 1496.5
Q3: 4000
95-th percentile: 15847.15
Maximum: 497000
Range: 497000
Interquartile range (IQR): 3762.25

Descriptive statistics

Standard deviation: 15180.46154
Coefficient of variation (CV): 3.212253749
Kurtosis: 213.8612252
Mean: 4725.79775
Median Absolute Deviation (MAD): 1496.5
Skewness: 11.72346383
Sum: 94515955
Variance: 230446412.6

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 4407 | 22.0%
1000 | 929 | 4.6%
2000 | 800 | 4.0%
3000 | 594 | 3.0%
5000 | 545 | 2.7%
1500 | 301 | 1.5%
4000 | 274 | 1.4%
10000 | 214 | 1.1%
500 | 183 | 0.9%
6000 | 176 | 0.9%
Other values (5283) | 11577 | 57.9%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4407 | 22.0%
1 | 16 | 0.1%
2 | 13 | 0.1%
3 | 11 | 0.1%
4 | 13 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
497000 | 1 | < 0.1%
432130 | 1 | < 0.1%
400046 | 1 | < 0.1%
331788 | 1 | < 0.1%
330982 | 1 | < 0.1%
 

PAY_AMT5
Real number (ℝ≥0)

ZEROS

Distinct count: 5248
Unique (%): 26.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4758.7926
Minimum: 0.0
Maximum: 417990.0
Zeros: 4544
Zeros (%): 22.7%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 216
median: 1500
Q3: 4000
95-th percentile: 15794.75
Maximum: 417990
Range: 417990
Interquartile range (IQR): 3784

Descriptive statistics

Standard deviation: 15447.36965
Coefficient of variation (CV): 3.246069108
Kurtosis: 177.201995
Mean: 4758.7926
Median Absolute Deviation (MAD): 1500
Skewness: 11.1072561
Sum: 95175852
Variance: 238621229.1

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 4544 | 22.7%
1000 | 889 | 4.4%
2000 | 861 | 4.3%
3000 | 623 | 3.1%
5000 | 551 | 2.8%
1500 | 290 | 1.5%
4000 | 259 | 1.3%
10000 | 208 | 1.0%
500 | 167 | 0.8%
2500 | 153 | 0.8%
Other values (5238) | 11455 | 57.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4544 | 22.7%
1 | 15 | 0.1%
2 | 8 | < 0.1%
3 | 7 | < 0.1%
4 | 8 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
417990 | 1 | < 0.1%
388071 | 1 | < 0.1%
379267 | 1 | < 0.1%
332000 | 1 | < 0.1%
330982 | 1 | < 0.1%
 

PAY_AMT6
Real number (ℝ≥0)

ZEROS

Distinct count: 5227
Unique (%): 26.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5080.15935
Minimum: 0.0
Maximum: 528666.0
Zeros: 4940
Zeros (%): 24.7%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 10
median: 1407
Q3: 4000
95-th percentile: 17398.1
Maximum: 528666
Range: 528666
Interquartile range (IQR): 3990

Descriptive statistics

Standard deviation: 17306.82153
Coefficient of variation (CV): 3.406747769
Kurtosis: 176.7194765
Mean: 5080.15935
Median Absolute Deviation (MAD): 1407
Skewness: 10.69589634
Sum: 101603187
Variance: 299526071.6

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 4940 | 24.7%
1000 | 873 | 4.4%
2000 | 862 | 4.3%
3000 | 575 | 2.9%
5000 | 539 | 2.7%
1500 | 319 | 1.6%
4000 | 282 | 1.4%
10000 | 234 | 1.2%
500 | 174 | 0.9%
6000 | 153 | 0.8%
Other values (5217) | 11049 | 55.2%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4940 | 24.7%
1 | 15 | 0.1%
2 | 6 | < 0.1%
3 | 10 | 0.1%
4 | 7 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
528666 | 1 | < 0.1%
527143 | 1 | < 0.1%
403500 | 1 | < 0.1%
372495 | 1 | < 0.1%
345293 | 1 | < 0.1%
 

default.payment.next.month
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Common values
Value | Count | Frequency (%)
0 | 15442 | 77.2%
1 | 4558 | 22.8%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (as long as both scale factors have the same sign), so for an exact linear relationship with nonzero slope, r is +1 or -1 regardless of how steep the line is.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
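A minimal, dependency-free sketch of that definition follows. The report itself computes correlations through pandas; `pearson_r` is an illustrative name. The three calls show the slope invariance: a steep line and a shallow line both give r of about 1, and a decreasing line gives about -1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined for a constant sequence")
    return cov / (sx * sy)  # the 1/(n - 1) factors cancel out


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))    # steep line, approximately 1.0
print(pearson_r(x, [0.1 * v - 3 for v in x]))  # shallow line, approximately 1.0
print(pearson_r(x, [-v for v in x]))           # decreasing line, approximately -1.0
```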

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
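Following that recipe, Spearman's ρ is Pearson's r applied to ranks. In the sketch below, tied values receive the average of the positions they occupy (a common convention, assumed here), and the helper names are illustrative. The example compares ρ and r on a monotonic but nonlinear relationship:

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j share one 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]     # monotonic but not linear
print(spearman_rho(x, y))   # approximately 1.0
print(pearson_r(x, y))      # below 1 because the relationship is not linear
```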

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
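The counting rule maps directly onto a loop over all pairs. The sketch below is the plain tau-a variant that matches this description. Library implementations such as SciPy's `kendalltau` default to the tie-adjusted tau-b instead, so results can differ when the data contain ties. `kendall_tau_a` is an illustrative name:

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
        # s == 0 is a tie in X or Y; tau-a counts it as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant, 3 pairs
```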

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013, which can be found here.
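A sketch of the bias correction, written from the standard published form of Bergsma's estimator. It is an assumption that pandas-profiling implements exactly this form. With φ² = χ²/n for an r × k table, the correction is φ̃² = max(0, φ² - (k-1)(r-1)/(n-1)), k̃ = k - (k-1)²/(n-1), r̃ = r - (r-1)²/(n-1), and Ṽ = sqrt(φ̃² / min(k̃-1, r̃-1)). The function name is illustrative:

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramer's V (Bergsma, 2013) for an r x k table of counts.

    Assumes every row total and column total is positive.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the counts expected under independence.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association, approximately 1.0
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence, 0.0
```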

Missing values

Sample

First rows

,ID,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_0,PAY_2,PAY_3,PAY_4,PAY_5,PAY_6,BILL_AMT1,BILL_AMT2,BILL_AMT3,BILL_AMT4,BILL_AMT5,BILL_AMT6,PAY_AMT1,PAY_AMT2,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6,default.payment.next.month
0,1,20000.0,2,2,1,24,2,2,-1,-1,-2,-2,3913.0,3102.0,689.0,0.0,0.0,0.0,0.0,689.0,0.0,0.0,0.0,0.0,1
1,2,120000.0,2,2,2,26,-1,2,0,0,0,2,2682.0,1725.0,2682.0,3272.0,3455.0,3261.0,0.0,1000.0,1000.0,1000.0,0.0,2000.0,1
2,3,90000.0,2,2,2,34,0,0,0,0,0,0,29239.0,14027.0,13559.0,14331.0,14948.0,15549.0,1518.0,1500.0,1000.0,1000.0,1000.0,5000.0,0
3,4,50000.0,2,2,1,37,0,0,0,0,0,0,46990.0,48233.0,49291.0,28314.0,28959.0,29547.0,2000.0,2019.0,1200.0,1100.0,1069.0,1000.0,0
4,5,50000.0,1,2,1,57,-1,0,-1,0,0,0,8617.0,5670.0,35835.0,20940.0,19146.0,19131.0,2000.0,36681.0,10000.0,9000.0,689.0,679.0,0
5,6,50000.0,1,1,2,37,0,0,0,0,0,0,64400.0,57069.0,57608.0,19394.0,19619.0,20024.0,2500.0,1815.0,657.0,1000.0,1000.0,800.0,0
6,7,500000.0,1,1,2,29,0,0,0,0,0,0,367965.0,412023.0,445007.0,542653.0,483003.0,473944.0,55000.0,40000.0,38000.0,20239.0,13750.0,13770.0,0
7,8,100000.0,2,2,2,23,0,-1,-1,0,0,-1,11876.0,380.0,601.0,221.0,-159.0,567.0,380.0,601.0,0.0,581.0,1687.0,1542.0,0
8,9,140000.0,2,3,1,28,0,0,2,0,0,0,11285.0,14096.0,12108.0,12211.0,11793.0,3719.0,3329.0,0.0,432.0,1000.0,1000.0,1000.0,0
9,10,20000.0,1,3,2,35,-2,-2,-2,-2,-1,-1,0.0,0.0,0.0,0.0,13007.0,13912.0,0.0,0.0,0.0,13007.0,1122.0,0.0,0

Last rows

,ID,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_0,PAY_2,PAY_3,PAY_4,PAY_5,PAY_6,BILL_AMT1,BILL_AMT2,BILL_AMT3,BILL_AMT4,BILL_AMT5,BILL_AMT6,PAY_AMT1,PAY_AMT2,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6,default.payment.next.month
19990,19991,150000.0,2,2,1,32,-1,2,-1,0,0,0,3204.0,430.0,29425.0,30437.0,31123.0,28997.0,0.0,29425.0,2000.0,1528.0,1518.0,2000.0,0
19991,19992,70000.0,2,2,1,33,1,2,0,0,0,0,51900.0,49307.0,48214.0,27572.0,26810.0,20541.0,0.0,1500.0,1300.0,1505.0,1000.0,1000.0,0
19992,19993,240000.0,2,2,1,34,-1,-1,0,0,-1,-1,626.0,1921.0,20740.0,21274.0,888.0,360.0,1921.0,19000.0,2624.0,888.0,360.0,360.0,0
19993,19994,250000.0,2,1,1,49,-1,-1,-2,-2,-1,0,1104.0,0.0,0.0,0.0,3000.0,1500.0,0.0,0.0,0.0,3000.0,0.0,3212.0,0
19994,19995,440000.0,2,2,1,41,0,0,0,0,0,0,348397.0,356586.0,366049.0,262697.0,267922.0,274502.0,14006.0,19077.0,9518.0,9576.0,11083.0,12010.0,0
19995,19996,130000.0,2,2,1,40,0,0,0,0,0,0,133559.0,129869.0,118032.0,95953.0,73970.0,107785.0,5400.0,6950.0,4600.0,4000.0,2000.0,2300.0,0
19996,19997,60000.0,2,2,1,37,0,0,0,0,0,0,59462.0,60866.0,54007.0,52089.0,29397.0,29110.0,3000.0,2570.0,2202.0,1200.0,1100.0,1100.0,1
19997,19998,290000.0,2,2,1,41,-1,-1,-2,-1,0,-1,2025.0,0.0,0.0,9194.0,9194.0,399.0,0.0,0.0,9194.0,0.0,399.0,9290.0,0
19998,19999,150000.0,2,2,1,41,0,0,-1,-1,-1,-1,4474.0,3881.0,1207.0,1617.0,0.0,620.0,3000.0,2306.0,2610.0,0.0,620.0,0.0,0
19999,20000,240000.0,2,2,1,37,-1,2,-1,-1,-1,0,1769.0,842.0,14015.0,0.0,1317.0,566.0,0.0,14015.0,0.0,1317.0,0.0,0.0,0